This pruning algorithm then assigns an importance score |dR/dw_l · w_l| to each weight and removes the weights receiving the lowest such scores. In Figure 8, we plot the generalization of the family of models each aforementioned algorithm generates as a function of sparsity and training time in epochs. In Section 1, we show that the augmented training algorithm produces VGG-16 models with generalization that is indistinguishable from that of models produced by pruning with learning-rate rewinding. We refer to the top K% of training examples whose training loss improves the most during pruning as the top-improved examples. To examine the influence of these top-improved examples on generalization, for each sparsity pruning reaches, we train two dense models on two datasets respectively: (a) the original training dataset excluding the top-improved examples at the specified sparsity, which we denote as TIE (Top-Improved Examples); (b)
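As a rough illustration of this scoring rule (our reconstruction from the text, not the authors' code), the sketch below scores each weight by |dR/dw · w| and zeroes out the lowest-scoring fraction:

```python
import numpy as np

def prune_by_gradient_weight_score(weights, grads, sparsity):
    """Score each weight by |dR/dw * w| (our reading of the scoring rule)
    and zero out the `sparsity` fraction with the lowest scores."""
    scores = np.abs(grads * weights)
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # the k-th smallest score is the pruning threshold; ties may prune extra
    threshold = np.partition(scores.ravel(), k - 1)[k - 1]
    mask = scores > threshold
    return np.where(mask, weights, 0.0), mask
```

Large weights with large gradients survive; weights whose product is small are treated as unimportant regardless of their individual magnitudes.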
IFFair: Influence Function-driven Sample Reweighting for Fair Classification
Yang, Jingran, Zhang, Min, Zhang, Lingfeng, Wang, Zhaohui, Zhang, Yonggang
Because machine learning has significantly improved efficiency and convenience in society, it is increasingly used to assist or replace human decision-making. However, the data-driven paradigm makes related algorithms learn and even exacerbate potential bias in samples, resulting in discriminatory decisions against certain unprivileged groups, depriving them of the right to equal treatment, thus damaging social well-being and hindering the development of related applications. Therefore, we propose a pre-processing method, IFFair, based on the influence function. Compared with other fairness optimization approaches, IFFair only uses the influence disparity of training samples on different groups as guidance to dynamically adjust the sample weights during training, without modifying the network structure, data features, or decision boundaries. To evaluate the validity of IFFair, we conduct experiments on multiple real-world datasets and metrics. The experimental results show that our approach mitigates bias on multiple accepted metrics in the classification setting, including demographic parity, equalized odds, equality of opportunity, and error rate parity, without conflicts. It also demonstrates that IFFair achieves a better trade-off between multiple utility and fairness metrics compared with previous pre-processing methods.
- Europe > Switzerland > Basel-City > Basel (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > Middle East > Jordan (0.04)
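To make the reweighting idea concrete, here is a minimal, hypothetical sketch (the function name, update rule, and learning rate are our assumptions, not IFFair's exact algorithm): each sample's weight is shrunk in proportion to its influence disparity across the two groups, then renormalized so the total sample mass is unchanged:

```python
import numpy as np

def reweight_by_influence_disparity(weights, infl_group_a, infl_group_b, lr=0.1):
    """Hypothetical reweighting step: shrink the weight of samples whose
    influence differs most across the two groups, then renormalize so the
    total sample mass is unchanged."""
    disparity = np.abs(infl_group_a - infl_group_b)  # per-sample influence gap
    new_w = np.clip(weights - lr * disparity, 1e-6, None)
    return new_w * (weights.sum() / new_w.sum())
```

A step like this would be repeated during training, so samples that consistently widen the gap between groups contribute less to the loss over time.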
"The Dentist is an involved parent, the bartender is not": Revealing Implicit Biases in QA with Implicit BBQ
Wagh, Aarushi, Srivastava, Saniya
Existing benchmarks evaluating biases in large language models (LLMs) primarily rely on explicit cues, declaring protected attributes like religion, race, and gender by name. However, real-world interactions often contain implicit biases, inferred subtly through names, cultural cues, or traits. This critical oversight creates a significant blind spot in fairness evaluation. We introduce ImplicitBBQ, a benchmark extending the Bias Benchmark for QA (BBQ) with implicitly cued protected attributes across 6 categories. Our evaluation of GPT-4o on ImplicitBBQ reveals a troubling performance disparity relative to explicit BBQ prompts, with accuracy declining by up to 7% in the "sexual orientation" subcategory and consistent declines across most other categories. This indicates that current LLMs contain implicit biases undetected by explicit benchmarks. ImplicitBBQ offers a crucial tool for nuanced fairness evaluation in NLP.
- Banking & Finance (0.70)
- Health & Medicine > Therapeutic Area > Immunology (0.70)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.48)
A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations
Estimating the Remaining Useful Life (RUL) of mechanical systems is pivotal in Prognostics and Health Management (PHM). Rolling-element bearings are among the most frequent causes of machinery failure, highlighting the need for robust RUL estimation methods. Existing approaches often suffer from poor generalization, lack of robustness, high data demands, and limited interpretability. This paper proposes a novel multimodal-RUL framework that jointly leverages image representations (ImR) and time-frequency representations (TFR) of multichannel, nonstationary vibration signals. The architecture comprises three branches: (1) an ImR branch and (2) a TFR branch, both employing multiple dilated convolutional blocks with residual connections to extract spatial degradation features; and (3) a fusion branch that concatenates these features and feeds them into an LSTM to model temporal degradation patterns. A multi-head attention mechanism subsequently emphasizes salient features, followed by linear layers for final RUL regression. To enable effective multimodal learning, vibration signals are converted into ImR via the Bresenham line algorithm and into TFR using Continuous Wavelet Transform. We also introduce multimodal Layer-wise Relevance Propagation (multimodal-LRP), a tailored explainability technique that significantly enhances model transparency. The approach is validated on the XJTU-SY and PRONOSTIA benchmark datasets. Results show that our method matches or surpasses state-of-the-art baselines under both seen and unseen operating conditions, while requiring ~28 % less training data on XJTU-SY and ~48 % less on PRONOSTIA. The model exhibits strong noise resilience, and multimodal-LRP visualizations confirm the interpretability and trustworthiness of predictions, making the framework highly suitable for real-world industrial deployment.
- Asia > China > Anhui Province > Hefei (0.04)
- North America > United States > Texas > Schleicher County (0.04)
- Asia > Russia > Far Eastern Federal District > Magadan Oblast > Magadan (0.04)
- Asia > China > Zhejiang Province > Ningbo (0.04)
Towards Robust and Fair Next Visit Diagnosis Prediction under Noisy Clinical Notes with Large Language Models
A decade of rapid advances in artificial intelligence (AI) has opened new opportunities for clinical decision support systems (CDSS), with large language models (LLMs) demonstrating strong reasoning abilities on timely medical tasks. However, clinical texts are often degraded by human errors or failures in automated pipelines, raising concerns about the reliability and fairness of AI-assisted decision-making. Yet the impact of such degradations remains under-investigated, particularly regarding how noise-induced shifts can heighten predictive uncertainty and unevenly affect demographic subgroups. We present a systematic study of state-of-the-art LLMs under diverse text corruption scenarios, focusing on robustness and equity in next-visit diagnosis prediction. To address the challenge posed by the large diagnostic label space, we introduce a clinically grounded label-reduction scheme and a hierarchical chain-of-thought (CoT) strategy that emulates clinicians' reasoning. Our approach improves robustness and reduces subgroup instability under degraded inputs, advancing the reliable use of LLMs in CDSS.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
trained from scratch (Section 4.5), while most results of other papers or model zoo are fine-tuned from a pre-trained
We appreciate the reviewers for the constructive comments on this paper. One common concern is that our baseline for RetinaNet/Mask-RCNN is not strong. "ImageNet Pre-Training." is comparable with our baseline (39.5% vs. 39.24%). R1Q1: How do the latencies change on GPU? R1Q2: The improvements are not large. R1Q3: Compare to prior BSS search methods like POP [22] in Table 1.
Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach
Nguyen, Quang-Dung, Tran, Tri-Dung, Chu, Thanh-Hieu, Tran, Hoang-Loc, Cheng, Xiangwei, Slama, Dirk
The emergence of Software-Defined Vehicles (SDVs) marks a paradigm shift in the automotive industry, where software now plays a pivotal role in defining vehicle functionality, enabling rapid innovation in modern vehicles. Developing SDV-specific applications demands advanced tools to streamline code generation and improve development efficiency. In recent years, general-purpose large language models (LLMs) have demonstrated transformative potential across domains. Still, restricted access to proprietary model architectures hinders their adaptation to specific tasks like SDV code generation. In this study, we propose using prompts, a common and basic strategy for interacting with LLMs and steering their responses. Using only system prompts with an appropriate and efficient prompt structure, designed with advanced prompt engineering techniques, LLMs can be adapted without requiring a training session or access to their base design. This research conducts extensive experiments on different models by applying various prompting techniques, including bare models, using a benchmark specifically created to evaluate LLMs' performance in generating SDV code. The results reveal that the model with a few-shot prompting strategy outperforms the others in adjusting the LLM answers to match the expected outcomes based on quantitative metrics.
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.05)
- Europe > Germany (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (3 more...)
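As a sketch of what a few-shot system prompt for SDV code generation might look like (the template structure, function name, and example API call are illustrative, not taken from the paper):

```python
def build_sdv_system_prompt(task_description, examples):
    """Assemble an illustrative few-shot system prompt: an instruction
    header followed by request/code example pairs."""
    parts = [
        "You are an assistant that generates Software-Defined Vehicle (SDV) application code.",
        task_description,
        "Follow the style of the examples below.",
    ]
    for i, (request, code) in enumerate(examples, 1):
        parts.append(f"### Example {i}\nRequest: {request}\nCode:\n{code}")
    return "\n\n".join(parts)
```

The assembled string would be passed as the system prompt, so every user request is answered in the style established by the embedded examples, with no fine-tuning involved.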
Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency
Liang, Renzhao, Xu, Sizhe, Xie, Chenggang, Chen, Jingru, Ren, Feiyang, Yang, Shu, Yabe, Takahiro
Time series forecasting plays a pivotal role in critical domains such as energy management and financial markets. Although deep learning-based approaches (e.g., MLP, RNN, Transformer) have achieved remarkable progress, the prevailing "long-sequence information gain hypothesis" exhibits inherent limitations. Through systematic experimentation, this study reveals a counterintuitive phenomenon: appropriately truncating historical data can paradoxically enhance prediction accuracy, indicating that existing models learn substantial redundant features (e.g., noise or irrelevant fluctuations) during training, thereby compromising effective signal extraction. Building upon information bottleneck theory, we propose an innovative solution termed Adaptive Masking Loss with Representation Consistency (AMRC), which features two core components: 1) a dynamic masking loss, which adaptively identifies highly discriminative temporal segments to guide gradient descent during model training; 2) a representation consistency constraint, which stabilizes the mapping relationships among inputs, labels, and predictions. Experimental results demonstrate that AMRC effectively suppresses redundant feature learning while significantly improving model performance. This work not only challenges conventional assumptions in temporal modeling but also provides novel theoretical insights and methodological breakthroughs for developing efficient and robust forecasting models.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Germany (0.04)
- Asia > China > Guangxi Province > Nanning (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
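A toy sketch of a dynamic masking loss, under our own simplifying assumption that a timestep's "discriminativeness" can be proxied by the target's deviation from its mean (the paper's actual adaptive rule is not reproduced here):

```python
import numpy as np

def adaptive_masked_mse(pred, target, keep_ratio=0.5):
    """Keep only the `keep_ratio` most 'salient' timesteps in the loss,
    where salience is proxied by the target's deviation from its mean."""
    err = (pred - target) ** 2
    salience = np.abs(target - target.mean())
    k = max(1, int(keep_ratio * target.size))
    threshold = np.partition(salience.ravel(), -k)[-k]  # k-th largest salience
    mask = salience >= threshold
    return float(err[mask].mean())
```

Gradients then flow only through the retained segments, so errors on flat or noisy stretches do not dominate training.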
Hearing the Order: Investigating Selection Bias in Large Audio-Language Models
Lin, Yu-Xiang, Li, Chen-An, Wei, Sheng-Lun, Chen, Po-Chun, Chen, Hsin-Hsi, Lee, Hung-yi
Large audio-language models (LALMs) are often used in tasks that involve reasoning over ordered options. An open question is whether their predictions are influenced by the order of answer choices, which would indicate a form of selection bias and undermine their reliability. In this paper, we identify and analyze this problem in LALMs. We demonstrate that no model is immune to this bias through extensive experiments on six LALMs across three widely used benchmarks and their spoken counterparts. Shuffling the order of answer options can cause performance fluctuations of up to 24% and even change model rankings, raising concerns about the reliability of current evaluation practices. We also study permutation-based strategies and show that they can mitigate bias in most cases. Our work represents the first systematic investigation of this issue in LALMs, and we hope it raises awareness and motivates further research in this direction.
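One way such a permutation-based strategy can be sketched (a generic majority vote over option orderings, not necessarily the paper's exact method):

```python
from itertools import permutations
from collections import Counter

def permutation_vote(predict, question, options):
    """Query the model under every ordering of the options and majority-vote
    on option *content*, cancelling any preference for a fixed position."""
    votes = Counter()
    for order in permutations(range(len(options))):
        shuffled = [options[i] for i in order]
        chosen_position = predict(question, shuffled)  # model returns an index
        votes[shuffled[chosen_position]] += 1
    return votes.most_common(1)[0][0]
```

Because every option occupies every position equally often across the permutations, a model that merely favors, say, the first slot spreads its spurious votes evenly, while genuine preference for one option's content accumulates.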